Abstract

Background: Nrd1 and Nab3 are essential sequence-specific yeast RNA binding proteins that function as a heterodimer 
in the processing and degradation of diverse classes of RNAs. These proteins also regulate several mRNA coding genes; 
however, it remains unclear exactly what percentage of the mRNA component of the transcriptome these proteins 
control. To address this question, we used the pyCRAC software package developed in our laboratory to analyze 
CRAC and PAR-CLIP data for Nrd1-Nab3-RNA interactions. 

Results: We generated high-resolution maps of Nrd1-Nab3-RNA interactions, from which we have uncovered hundreds 
of new Nrd1-Nab3 mRNA targets, representing between 20 and 30% of protein-coding transcripts. Although Nrd1 and 
Nab3 showed a preference for binding near 5' ends of relatively short transcripts, they bound transcripts throughout 
coding sequences and 3' UTRs. Moreover, our data for Nrd1-Nab3 binding to 3' UTRs were consistent with a role for 
these proteins in the termination of transcription. Our data also support a tight integration of Nrd1-Nab3 with the 
nutrient response pathway. Finally, we provide experimental evidence for some of our predictions, using northern 
blot and RT-PCR assays. 

Conclusions: Collectively, our data support the notion that Nrd1 and Nab3 function is tightly integrated with the 
nutrient response and indicate a role for these proteins in the regulation of many mRNA coding genes. Further, we 
provide evidence to support the hypothesis that Nrd1-Nab3 represents a failsafe termination mechanism in instances 
of readthrough transcription. 



Background 

RNA binding proteins play crucial roles in the synthesis, 
processing and degradation of RNA in a cell. To better 
understand the function of RNA binding proteins, it is 
important to identify their RNA substrates and the sites 
of interaction. This helps to better predict their function 
and leads to the design of more focused functional analyses. 
Only recently, the development of cross-linking and 
immunoprecipitation (CLIP) and related techniques has 
made it possible to identify direct protein-RNA interac- 
tions in vivo at a very high resolution [1-5]. To isolate 
direct protein-RNA interactions, cells are UV irradiated 
to forge covalent bonds between the protein of interest 
and bound RNAs. The target protein is subsequently 
affinity purified under stringent conditions, and UV cross- 
linked RNAs are partially digested, ligated to adapters, 
RT-PCR amplified and sequenced. CLIP methods are 
becoming increasingly popular and produce valuable 
data. The number of papers describing the technique 
seems to double every year and it is now being applied 
in a wide range of organisms. The method is also under 
constant development: the individual-nucleotide resolution 
CLIP (iCLIP) approach has improved the accuracy of map- 
ping cross-linking sites [2,4], and incorporating photo- 
activatable nucleotides in RNA can enhance the UV 
cross-linking efficiency [1]. We have recently developed 
a stringent affinity-tag-based CLIP protocol (cross-linking 
and cDNA analysis (CRAC)) that can provide a higher 
specificity [5], and the tag-based approach is becoming 
more widely adopted [4,6]. The combination of CLIP with 
high-throughput sequencing (for example, HITS-CLIP) 
has markedly increased the sensitivity of the methodology 
and provided an unparalleled capability to identify 
protein-RNA interactions transcriptome-wide [3,5,7]. 
This approach is producing large amounts of valuable 
high-throughput sequencing data. Fortunately, many 
bioinformatics tools tailored to tackle the large 
CRAC/CLIP datasets are now becoming available [8-11]. We have 
recently developed a Python package, dubbed pyCRAC, 
that conveniently combines many popular CLIP/CRAC 
analysis methods in an easy-to-use package. 

Nrd1 and Nab3 are essential sequence-specific yeast 
RNA binding proteins that function as a heterodimer in 
processing and degradation of diverse classes of RNAs 
[12-19]. Transcription termination of RNA polymerase 
(Pol) II transcripts generally involves mRNA cleavage and 
addition of long polyA tails (cleavage and polyadenylation 
(CPF) pathway), which label the RNA ready for nuclear 
export (reviewed in [20]). By contrast, transcripts terminated 
by Nrd1-Nab3 generally contain short polyA tails 
and are substrates for the nuclear RNA degradation ma- 
chinery [21,22]. This activity is also important for small 
nucleolar RNA (snoRNA) maturation and degradation 
of cryptic unstable transcripts (CUTs) and stable unannotated 
transcripts (SUTs) [12,23-26]. Nrd1 and Nab3 
direct transcription termination of nascent transcripts by 
interacting with the highly conserved carboxy-terminal 
domain (CTD) of RNA polymerase II. Because this inter- 
action requires phosphorylation at serine 5 in the CTD, 
Nrd1 and Nab3 are believed to primarily operate on 
promoter-proximal regions where serine 5 phosphorylation 
levels are high [27,28]. 

Recent high-throughput studies have indicated that Nrd1 
and Nab3 frequently UV cross-link to mRNAs [6,24,29] 
and that thousands of mRNA coding genes harbor Nrd1 and 
Nab3 binding sequences (see below). However, thus far a 
relatively small number of mRNAs have been reported 
to be targeted by Nrd1 and Nab3 [25,30-33]. Indeed, it is 
not clear exactly what percentage of the mRNA transcriptome 
these proteins control. To address this question, we 
reanalyzed CRAC and PAR-CLIP data using the pyCRAC 
software package. We generated high-resolution maps 
of Nrd1-Nab3-RNA interactions, focusing on the presence 
of known RNA binding motifs in the sequencing 
data. We also confirmed some of our predictions 
experimentally. Our analyses revealed that Nrd1-Nab3 bound 
between 20 and 30% of protein-coding transcripts, several 
hundred of which had binding sites in untranslated regions 
(UTRs). Although Nrd1 and Nab3 showed a preference for 
binding near 5' ends of relatively short transcripts, they 
bound transcripts throughout coding sequences and 
3' UTRs. Our data suggest that Nrd1-Nab3 can terminate 
transcription of a long (approximately 5 kb) transcript 
by binding 3' UTRs and we speculate that the fate of 
many mRNAs is dictated by kinetic competition between 
the Nrd1-Nab3 and CPF termination pathways. Statistical 
analyses revealed that Nrd1 and Nab3 targets are significantly 
enriched for enzymes and permeases involved in 
nucleotide/amino acid synthesis and uptake, and for 
proteins involved in mitochondrial organization. Collectively, 
our data support the notion that Nrd1 and Nab3 function 
is tightly integrated with the nutrient response [30] and 
indicate a role for these proteins in the regulation of many 
mRNA coding genes. 

Results and discussion 

Identification of Nrd1-Nab3 binding sites in PAR-CLIP data 

Previous genetic and biochemical studies have identified 
a number of short Nrd1 and Nab3 RNA binding motifs 
(UCUU and CUUG for Nab3; UGUA and GUAG for Nrd1) 
[6,15,16,18,24,29]. Not surprisingly, almost every single 
mRNA coding gene in the yeast genome contains at 
least one copy of these motifs and could therefore be 
a Nrd1 and Nab3 target (see below). To get an impression 
of how many mRNAs are actually targeted by Nrd1 and 
Nab3 in yeast, we analyzed data from Nrd1 and Nab3 
CLIP/CRAC experiments using the pyCRAC software 
package [34]. 

Two high-throughput protein-RNA cross-linking studies 
on Nrd1 and Nab3 in yeast have recently been described, 
using PAR-CLIP [6,29] and the CRAC method [24]. Both 
studies produced very similar results and indicated that 
Nrd1 and Nab3 target RNAs generated by all three RNA 
polymerases. Here we focus on the PAR-CLIP data, as 
the number of uniquely mapped reads in these datasets 
was higher and allowed identification of a greater number 
of targets (data not shown). Figure 1 provides a schematic 
overview of how the read data were processed. All identi- 
cal read sequences were removed and only reads with 
unique chromosomal mapping positions were considered 
(Figure 1A,B). Negative control CLIP experiments often 
do not generate sufficient material for generating high 
quality cDNA libraries for sequencing. Because no con- 
trol PAR-CLIP samples were available, we calculated the 
minimum read coverage (or 'height') required to obtain 
a false discovery rate (FDR) of less than 0.01 for each 
annotated feature in the genome. Read contigs were 
generated from those regions with coverage higher than, 
or equal to, the minimum height (Figure 1C). We reasoned 
that this approach would reduce noise and sequence repre- 
sentation biases introduced by highly expressed genes. A 
potential drawback of this approach is that genes with high 
read coverage (such as tRNAs) are less likely to contain 
significantly enriched regions, leading to an underesti- 
mation of the number of binding sites in these genes. 
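
The idea of a minimum read coverage height can be illustrated with a short Python sketch. This is not the pyCalculateFDRs implementation; it assumes, purely for illustration, that the FDR at a given height is estimated as the fraction of positions reaching that height after randomly redistributing the reads over the feature, divided by the fraction observed in the real data. The function names, the number of iterations and the read coordinates are hypothetical.

```python
import random


def coverage(length, reads):
    """Per-nucleotide coverage for reads given as half-open (start, end) intervals."""
    cov = [0] * length
    for start, end in reads:
        for pos in range(max(start, 0), min(end, length)):
            cov[pos] += 1
    return cov


def fraction_at_or_above(cov, height):
    """Fraction of positions whose coverage is at least `height`."""
    return sum(1 for c in cov if c >= height) / len(cov)


def minimum_height(length, reads, fdr_cutoff=0.01, iterations=200, seed=0):
    """Smallest coverage height whose estimated FDR is below `fdr_cutoff`.

    Assumption (not taken from pyCRAC): FDR(height) is the mean fraction of
    positions at or above `height` in randomly shuffled read placements,
    divided by the same fraction in the observed data.
    Returns None if no height passes the cutoff.
    """
    rng = random.Random(seed)
    observed = coverage(length, reads)
    shuffled = []
    for _ in range(iterations):
        placed = []
        for start, end in reads:
            size = end - start
            # Place each read at a random position inside the feature.
            new_start = rng.randint(0, max(length - size, 0))
            placed.append((new_start, new_start + size))
        shuffled.append(coverage(length, placed))
    for height in range(1, max(observed) + 1):
        obs = fraction_at_or_above(observed, height)
        exp = sum(fraction_at_or_above(c, height) for c in shuffled) / iterations
        if exp / obs < fdr_cutoff:
            return height
    return None


# Hypothetical feature of 1,000 nucleotides with 30 reads stacked on one site.
stacked_reads = [(100, 130)] * 30
print(minimum_height(1000, stacked_reads))
```

Under these assumptions, reads that pile up on one site yield a finite minimum height, whereas reads spread evenly over a feature never exceed the coverage expected by chance and no enriched region is reported.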

We next searched for overrepresented sequences in 
Nrd1 and Nab3 read contigs (Figure 1E). Consistent 
with recently published work [24,29], previously identified 
Nrd1-Nab3 motifs were highly over-represented 

Figure 1 Schematic overview of the read-processing steps used for our analyses. Shown is a schematic representation of a gene containing 
two exons and one intron. Each black line indicates a read and asterisks indicate positions of T-C substitutions. (A, B) The first step involved removal of 
all identical sequences in raw reads by collapsing the data (using pyFastqDuplicateRemover) and aligning the remaining cDNA sequences to 
the genome. (C) pyCalculateFDRs was used to calculate the minimum read coverage height required to obtain an FDR <0.01. (D) Contigs were 
generated from significantly enriched regions and T-C mutation frequencies were calculated (using pyCalculateMutationFrequencies). (E, F) We 
then used pyMotif to identify Nrd1-Nab3 motifs in contigs (E), and selected only those motifs where we could find at least one T-C mutation in 
overlapping reads (F). These are referred to as 'cross-linked motifs' throughout the manuscript. 



(Table S1 in Additional file 1). Additionally, the recently 
described AU-rich Nrd1 motifs (UGUAA and UGUAAA) 
[29,35] were among the top scoring 5- and 6-mers, 
respectively. Because UV-induced cross-linking sites in 
PAR-CLIP data are often highlighted by T-C substitutions 
[1], we reasoned we could obtain higher confidence 
binding sites by focusing on motif sequences isolated from 
contigs that contained a T-C substitution in at least one 
overlapping read (Figure 1D-F). All T-C substitutions in 
reads were weighted equally and included as mutations 
in contigs (Figure 1D). Additional file 2 shows that T-C 
mutations in contigs generated from the Nrd1 PAR-CLIP 
data were clearly enriched over Nrd1 motifs, confirming 
that Nrd1 has a strong preference for cross-linking to 
these sites [6,24,29]. Sequence contigs generated from 
the Nab3 data sets had high T-C mutation frequencies 
(Figure S1B in Additional file 2) and only a modest 
enrichment could be seen downstream of Nab3 motifs. This 
result is in contrast with recent analyses performed on 
Nab3 CRAC data, where cross-linking sites were mainly 
detected within UCUU and CUUG sequences (Figure S1C 
in Additional file 2) [24]. This discrepancy could be, in 

Figure 2 Comparison of predicted and identified binding sites. (A) Overview of the percentage (y-axis) of genes in genomic features (x-axis) 
that contain Nrd1 (blue) or Nab3 (red) motifs in their sequence. (B) The percentage of genomic features that contained cross-linked Nrd1 or 
Nab3 motifs. (C) The percentage of all Nrd1 and Nab3 motifs in gene/feature sequences found in the PAR-CLIP data analyses. (D) The distribution 
of cross-linked motifs over UTR and exon sequences. ncRNA, non-coding RNA; snRNA, small nuclear RNA. 



part, the result of noise in the Nab3 PAR-CLIP data, as 
other short sequences were more highly enriched in Nab3 
contigs than the previously reported Nab3 binding sites 
(Table S1 in Additional file 1). To reduce noise, we only 
selected Nab3 motifs containing T-C substitutions from 
contigs (Figure 1F), hereafter referred to as 'cross-linked 
motifs'. Overall, our motif analyses are in excellent agreement 
with previously published work. 
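
The selection of cross-linked motifs can be sketched in Python as follows. This is an illustration of the selection rule, not the pyMotif code: it assumes that a motif is kept when at least one read overlapping it carries a T-C substitution (our reading of the Figure 1 legend), that sequences are written in the DNA alphabet (T for U), and that reads are supplied as hypothetical (start, end, T-C positions) tuples in contig coordinates.

```python
import re

# Known motifs written in the DNA alphabet (UGUA/GUAG and UCUU/CUUG).
NRD1_MOTIFS = ("TGTA", "GTAG")
NAB3_MOTIFS = ("TCTT", "CTTG")


def find_motifs(sequence, motifs):
    """Return sorted (start, end, motif) tuples, including overlapping hits."""
    hits = []
    for motif in motifs:
        # A zero-width lookahead reports occurrences that overlap each other.
        for match in re.finditer("(?=%s)" % re.escape(motif), sequence):
            hits.append((match.start(), match.start() + len(motif), motif))
    return sorted(hits)


def cross_linked_motifs(sequence, motifs, reads):
    """Keep motifs overlapped by at least one read carrying a T-C substitution.

    `reads` is a list of (start, end, tc_positions) tuples with half-open
    coordinates; `tc_positions` lists the T-C substitutions seen in that read.
    """
    kept = []
    for start, end, motif in find_motifs(sequence, motifs):
        for read_start, read_end, tc_positions in reads:
            overlaps = read_start < end and start < read_end
            if overlaps and tc_positions:
                kept.append((start, end, motif))
                break
    return kept


# Hypothetical contig with one Nrd1 site (TGTAG) and one Nab3 site (TCTTG).
contig = "AAA" + "TGTAG" + "C" * 10 + "TCTTG" + "AAA"
reads = [(0, 10, [5]), (15, 26, [])]
print(cross_linked_motifs(contig, NRD1_MOTIFS, reads))
print(cross_linked_motifs(contig, NAB3_MOTIFS, reads))
```

In this toy contig the Nrd1 motifs are retained because the overlapping read carries a substitution, whereas the Nab3 motifs are discarded because the only read covering them has none.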

At least a quarter of the mRNAs are Nrd1-Nab3 targets 

Figure 2A provides an overview of the percentage of 
genes in the genome that contain Nrd1 (UGUA, GUAG) 
and Nab3 (UCUU, CUUG) motifs. The vast majority of 
motifs were found in protein coding genes and cryptic 
Pol II transcripts such as CUTs and SUTs. Although 
generally fewer motifs were present in short non-coding 
RNA genes (tRNAs, small nuclear RNAs (snRNAs) and 
snoRNAs; Figure 2A), a high percentage of these motifs 
contained T-C substitutions in the PAR-CLIP data 
(Figure 2C). Many Nrd1 and Nab3 motifs are located 
in snoRNA flanking regions, which were not included 
in our analyses. Therefore, the number provided here is an 
underestimation of the total snoRNA targets. Strikingly, 
the PAR-CLIP analyses showed that Nrd1 and Nab3 
cross-linked to 20 to 30% of the approximately 6,300 mRNA 
transcripts analyzed (Figure 2B), although only a relatively 
small fraction of all motifs present in the genomic 
sequence contained T-C substitutions (less than 5%; 
Figure 2C). Around 50% of the cross-linked motifs mapped 
to untranslated regions, with a preference for 5' UTRs 
(Figure 2D). Consistent with recently published data, 
our analyses identified the telomerase RNA (TLC1) as a 
Nrd1-Nab3 target [29,36]. Other non-coding RNA targets 
included the RNase P RNA (RPR1), the signal recognition 
particle RNA (SCR1) and ICR1. Collectively, 
our analyses uncovered over a thousand mRNAs that 
could be regulated by Nrd1 and Nab3. 

Nrd1 and Nab3 preferentially bind to 5' ends of a subset 
of mRNA transcripts 

To refine our analyses, we generated genome-wide coverage 
plots for cross-linked Nrd1 and Nab3 motifs and compared 
them to the distribution of the motifs present in the 
genome (Figure 3A). UTR and transcript lengths were 
normalized by dividing the sequences into an equal number 
of bins. For each bin we estimated the Nab3/Nrd1 binding 

Figure 3 Distribution of Nrd1 and Nab3 motifs in protein coding regions. (A) Nrd1 and Nab3 preferentially bind near 5' ends of mRNA 
transcripts. Shown are pyBinCollector coverage plots displaying the Nrd1 and Nab3 motif distribution in the exons and UTRs of all non-intronic 
mRNAs. To normalize the gene lengths the exon sequences were divided into 130 bins and UTRs into 10 bins. Probabilities were calculated by dividing 
the density values for cross-linked motifs found in the PAR-CLIP data by the density values for all the motifs found in mRNA coding genes. 
(B) Heat map showing the distribution of cross-linked Nrd1 and Nab3 motifs (blue) over individual protein coding genes. pyBinCollector was used to 
produce a distribution matrix of cross-linked motifs over individual protein coding sequences and the resulting output was k-means clustered using 
Cluster 3.0. (C) Distribution of cross-linked Nrd1 and Nab3 motifs around stop codons and relative to the positions of polyadenylation sites. 

probability by dividing the number of cross-linked motifs 
by the total number of motifs in that bin. To evaluate the 
quality of the coverage plots, we generated heat maps 
displaying the distribution of Nrd1 and Nab3 motifs in 
individual protein coding genes (Figures 3B and 4). 
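
The per-bin binding probability described above can be written down as a small Python sketch. It is a simplified illustration rather than the pyBinCollector implementation: the input layout (a list of dictionaries with hypothetical keys 'length', 'all_motifs' and 'cross_linked') and the default of 10 bins are assumptions made for the example.

```python
def bin_index(position, length, n_bins):
    """Map a 0-based position in a sequence of `length` onto one of `n_bins` bins."""
    return min(position * n_bins // length, n_bins - 1)


def binding_probability(transcripts, n_bins=10):
    """Per-bin ratio of cross-linked motifs to all motifs.

    Each transcript is a dict with the hypothetical keys 'length',
    'all_motifs' and 'cross_linked' (lists of motif start positions).
    Bins without any motif get a probability of 0.0.
    """
    total = [0] * n_bins
    bound = [0] * n_bins
    for transcript in transcripts:
        length = transcript["length"]
        for pos in transcript["all_motifs"]:
            total[bin_index(pos, length, n_bins)] += 1
        for pos in transcript["cross_linked"]:
            bound[bin_index(pos, length, n_bins)] += 1
    return [b / t if t else 0.0 for b, t in zip(bound, total)]


# Two hypothetical transcripts of different length share the same 10 bins.
example = [
    {"length": 100, "all_motifs": [5, 15, 55, 95], "cross_linked": [5]},
    {"length": 200, "all_motifs": [10, 110, 190], "cross_linked": [10, 190]},
]
print(binding_probability(example))
```

Dividing by the number of motifs present in each bin, rather than plotting raw counts, corrects for regions that simply contain more motifs.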

Both Nrd1 and Nab3 are co-transcriptionally recruited to 
the Pol II CTD. Chromatin immunoprecipitation (ChIP) 
experiments have indicated a preference for Nrd1-Nab3 
binding near the 5' ends of protein coding genes [27,28,37]. 
Binding of Nrd1 and Nab3 near the 5' end of transcripts 
can lead to premature transcription termination and it was 
proposed that this was a regulatory mechanism for 
down-regulating mRNA levels. Indeed, transcriptome-wide, the 
probability of finding cross-linked motifs was higher near 
the 5' end of protein coding genes (Figure 3A). However, 
the heat maps in Figure 3B show that the distribution of 
cross-linked motifs over mRNAs varied considerably, 
and indicated that a relatively small number of genes 
contributed most of the signal near 5' ends. K-means 
clustering of the pyBinCollector data revealed 308 
transcripts where cross-linked Nrd1 and/or Nab3 motifs 
concentrated near 5' ends (highlighted by a red-dotted 
line in Figures 3B and 4), primarily downstream of the 
transcription start site (TSS) (Figure 4). This group 
included previously described Nrd1-Nab3 targets, such as 
PCF11, URA8 and NRD1 (Figures 4 and 5A) [6,25,29] 
and therefore may represent a group of genes that are 
regulated by Nrd1-Nab3-dependent premature transcription 
termination. Notably, this group also included 
numerous other genes required for mRNA 3' end formation 
as well as genes encoding turnover and export factors 
(Figures 4 and 5B; PAP2/TRF4, PTI1, REF2, DHH1, 
NAB2, TEX1, PTI1, NOT5). We speculate that Nrd1 and 
Nab3 can regulate mRNA metabolism at many levels. 

Gene Ontology term analyses on this list of transcripts 
also revealed a significant enrichment of enzymes with 
oxidoreductase activity (almost 10%; P-value <0.02) and 
genes involved in cellular transport activities, such as 
transport of nitrogen compounds (8.8%; P-value = 0.0069). These included 
genes involved in ergosterol biosynthesis (Figure 5C; 
ERG24, ERG3 and ERG4), nucleoporins (KAP114, KAP108/ 
SXM1, KAP121/PSE1, KAP142/MSN5), several nucleoside 
and amino acid permeases (FUR4, MEP3, MMP1, DIP5, 
CAN1, FCY2, BAP3; Figure 5D) and various other 
transporters (TPO1, TPO3, TAT1, YCF1). 

Regulation of many genes involved in nucleotide 
biosynthesis is dictated by nucleotide availability and involves 
selection of alternative TSSs (IMD2, URA2, URA8 and 
ADE12) [42-45]. When nucleotide levels are sufficient, 
transcription starts at upstream TSSs and the elongating 
polymerase reads through Nrd1-Nab3 binding sites. 
When Nrd1-Nab3 bind these transcripts they are 
targeted for degradation. Indeed, several of the transcripts 
that originate from alternative TSSs have been annotated 
as CUTs. For a number of genes we could also detect 
cross-linked motifs upstream of the TSSs. Interestingly, 
cryptic transcription (XUTs and/or CUTs) was detected 
just upstream of AIM44, CDC47/MCM7, DIP5, ERG24, 
EMI2, FCY2, FRE1, GPM2, IRA2, MIG2, MYO1, TIR2, 
TEX1, YOR352W and YGR269W [38,39] (red colored gene 
names in Figure 4), hinting that these genes could also be 
regulated via alternative start site selection. 

Collectively, these data are consistent with a role for 
Nrd1 and Nab3 in the nutrient response pathway [30] 
and we speculate that Nrd1-Nab3-dependent premature 
termination is a more widely used mechanism for 
regulating mRNA levels than was previously anticipated [25]. 

Nrd1 and Nab3 bind 3' UTRs of several hundred mRNAs 

Nrd1 and Nab3 have been shown to regulate expression 
of mRNA transcripts by binding 3' UTRs. It was 
proposed that in cases where the polymerase fails to 
terminate at conventional polyadenylation sites, Nrd1 and Nab3 
binding to 3' UTRs could act as a transcription 
termination 'fail-safe' mechanism [32]. From our data we predict 
that this is likely a widely used mechanism to prevent Pol 
II from transcribing beyond normal transcription 
termination sites. 

We identified a total of 373 transcripts (approximately 
6% of all protein coding genes analyzed) where 
cross-linked Nrd1 and/or Nab3 motifs mapped to 3' UTRs 
(Table S2 in Additional file 1). Two examples are shown 
in Figure 5B,E. We identified several cross-linked Nrd1 
and Nab3 motifs downstream of the MSN1 and NAB2 
coding sequences. We speculate that these are examples 
of 'fail-safe' termination, where Nrd1 and Nab3 prevent 
read-through transcription into neighboring genes located 
on the same (TRF4) or opposite strand (RPS2). This 
arrangement of termination sites is reminiscent of the 
region downstream of RPL9B (Figure 5F), where the CPF 
and Nrd1-Nab3 termination machineries act in 
competition [33]. Cross-linked Nrd1 motifs also appeared enriched 
near the 3' ends of protein coding genes (Figure 5A,B). The 
Nrd1 GUAG and GUAA motifs contain stop codons and 
we found that indeed a fraction of the cross-linked Nrd1 
motifs recovered from the PAR-CLIP data overlapped with 
stop codons (Figure 5C). 

A role for Nrd1-Nab3-dependent 3' end processing 
of mRNA has also been described: the TIS11/CTH2 
mRNA is generated from approximately 1,800-nucleotide, 
3' extended precursors and binding of Nrd1 and Nab3 
to 3' UTRs recruits the exosome that is responsible for 
trimming the extended RNAs [31]. Our analysis 
identified six cross-linked Nrd1-Nab3 motifs within this 
1,800-nucleotide CTH2 region (Figure 6A) and we could find 
several other examples of genes with a similar organization 
of binding sites. One striking example was TRA1, a 
component of the SAGA and NuA4 histone acetyltransferase 

Figure 4 Distribution of cross-linked Nrd1 and Nab3 motifs around transcription start sites. The pileup on top of the heat maps indicates 
the cumulative distribution of cross-linked motifs within a 500-nucleotide window of transcription start sites. The heat map shows the distribution 
of cross-linked motifs (blue) within individual transcripts. The dashed line indicates the positions of transcription start sites. Red gene names 
indicate genes where cryptic transcription was detected upstream, whereas cyan colored gene names indicate transcripts previously shown 
to be regulated by Nrd1-Nab3-dependent transcription termination. 

Figure 5 Nrd1 and Nab3 binding to a selected number of 
protein-coding transcripts. (A-G) Shown are UCSC genome 
browser images for a number of genes predicted to be regulated by 
Nrd1-Nab3. Coverage of unique cDNAs from Nrd1, Nab3 and Pol II 
(Rpb2) PAR-CLIP data [6,29] on Watson (+) and Crick (-) strands is 
shown as black histograms. Locations of cross-linked Nrd1-Nab3 motifs 
(this work), annotated Xrn1-sensitive unstable transcripts (XUTs), 
polyadenylation sites and UTRs [22,38-41] are included as rectangles. 
Genomic features located on the Watson (+) strand are indicated in 
red, whereas features on the Crick (-) strand are indicated in blue. 
'Selected intervals' indicate genomic regions with a read coverage 
FDR <0.01. These were used for pyMotif analyses. 



complex (Figure 6B). Several Nrd1-Nab3 peaks and four 
cross-linked Nrd1 motifs were identified downstream of 
the TRA1 coding sequence. Notably, the downstream 
regions of CTH2 and TRA1 overlap with transcripts 
annotated as 'anti-sense regulatory non-coding RNAs' 
(Xrn1-sensitive unstable transcripts (XUTs)) [46], raising 
the question of whether these XUTs are products of 
read-through transcription. 

Nrd1-Nab3 and mitochondrion organization 

The Corden laboratory recently demonstrated a role for 
Nrd1 in mitochondrial DNA maintenance [30]. An 
nrd1-102 temperature-sensitive mutant showed a higher 
mitochondrial DNA content and was synthetically lethal with 
a deletion of AIM37, a gene involved in mitochondrial 
inheritance [30,47]. Remarkably, a statistically significant 
fraction of the cross-linked Nrd1 and Nab3 motifs located 
in 3' UTRs mapped to genes involved in mitochondrial 
organization and maintenance (37 genes, P-value 0.011). 
These include those encoding the mitochondrial DNA 
binding protein (ILV5), the nuclear pore associated 
protein (AIM4; Figure 5G), a large number of proteins that 
localize to the mitochondrial inner membrane (COX16, 
COX17, FCJ1, TIM12, TIM14/PAM18, TIM54, YLH47, 
YTA12, CYC2, COA3, OXA1) and several mitochondrial 
ribosomal proteins (NAM9, MRP13, MRPL3, MRPL21, 
MRPL22 and MRPL38). Notably, cells lacking AIM4 show 
defects in mitochondrial biogenesis similar to those of an aim37Δ 
strain [47]. 

Collectively, the data suggest that Nrd1 and Nab3 play an 
important role in mitochondrial function and development. 

Nab3 is required for fail-safe termination of the convergent 
HHT1 and IPP1 genes 

To substantiate our results we analyzed expression 
levels of several genes that we predicted were regulated 
by Nrd1-Nab3 (Figure 7A). For these analyses we used 
strains in which the Nrd1 and Nab3 genes were placed 
under the control of a galactose-inducible/glucose-repressible 
promoter (GAL/GLU; Figure 7B), allowing us 
to deplete these proteins by growing the cells in 
glucose-containing medium using well established conditions 

Figure 6 Nrd1 and Nab3 binding to CTH2, SLX4 and TRA1 transcripts. (A, B) Coverage of unique cDNAs from Nrd1, Nab3 and Pol II (Rpb2) 
PAR-CLIP data [6,29] on Watson (+) and Crick (-) strands is shown as black histograms. 'Selected intervals' indicates genomic regions with a read 
coverage FDR <0.01 used for pyMotif analyses. Locations of cross-linked Nrd1-Nab3 motifs (this work), annotated XUTs, CUTs, SUTs (if present), 
polyadenylation sites and UTRs [22,38-41] are included as rectangles. Genomic features located on the Watson (+) strand are indicated in red, 
whereas features on the Crick strand (-) are indicated in blue. 

Figure 7 Nab3 is required to suppress cryptic transcriptional activities. (A) UCSC genome browser images of the region showing HHT1 and 
IPP1. 'Selected intervals' indicate genomic regions with a read coverage FDR =0.01 used for pyMotif analyses. See the legend to Figure 5 for 
additional details. Chromosomal positions of RT-PCR products and northern blot probes are also indicated. (B) Western blot displaying levels of 
3HA-tagged Nrd1 and Nab3 proteins before and after the shift to glucose. Experimental details are provided in the Materials and methods. 
Proteins were detected using horseradish peroxidase-conjugated anti-HA antibodies (Santa Cruz). (C) Schematic representation of transcripts generated in 
the SNR13-TRS31 region of yeast chromosome IV (adapted from [13]). About 1 to 4% of the SNR13 transcripts were read-through transcripts in 
Nab3 and Nrd1 depleted cells, respectively. (D) Northern blot analysis of IPP1, HHT1, snR13 and U2 snRNA and 3' extended species. Shown are 
phosphoimager scans of a blot probed with various oligonucleotides (indicated on the left of each panel). U2 snRNA levels were used as a 
loading control. (E) Depletion of Nrd1 and/or Nab3 results in a reduction of HHT1 and IPP1 mRNA levels. The mRNA levels were quantified using 
the ADA software package and normalized to both the levels in the parental strain and the U2 snRNA. (F, G) Quantitative RT-PCR analysis of 
HHT1 and IPP1 transcription in coding sequences (exon) and downstream regions. Fold change in transcription downstream of these genes was 
calculated by normalizing the data of the downstream regions to the signals obtained for the exon region. Error bars indicate standard deviations. 
(H) Detection of IPP1 read-through transcripts by end-point RT-PCR. The diagram indicates the regions amplified. The position of 3' extended 
products and exon fragments in the gel are indicated on the right of the gel image. 



[24]. Transcript levels were analyzed by northern 
blotting and/or RT-PCR (endpoint and quantitative; 
Figures 7 and 8). Consistent with previous work [13], 
northern blot analyses showed that depletion of Nrd1 
and/or Nab3 resulted in read-through transcription beyond 
the SNR13 gene through the TRS31 gene (Figure 7C,D). 
Under the depletion conditions used, between 1% 
(Nrd1-depleted) and 3.5% (Nab3-depleted) of the SNR13 RNAs 
were read-through transcripts (Figure 7C). 

The convergent HHT1 and IPP1 genes came to our 
attention because we identified a cross-linked Nab3 motif 
that mapped to a XUT located directly downstream of 
the HHT1 gene (Figure 7A). XUTs can silence expression 
of neighboring sense genes by modulating their chromatin 
state [46]; therefore, this XUT could play a role in regulat- 
ing IPP1 expression. In addition, substantial Nab3 cross- 
linking was also observed to anti-sense HHT1 transcripts 
(Figure 7A). We predicted that Nab3 was required to 
suppress multiple cryptic transcriptional activities in 
this region. 

Quantification of the northern data shown in Figure 7D 
revealed a two- to four-fold reduction in HHT1 and 
IPP1 mRNA levels in the absence of Nrd1 and/or Nab3 
(Figure 7E). These results indicate a role for Nrd1 and 
Nab3 in regulating mRNA levels of these genes. 

We were unable to detect the XUT by northern blotting 
(using oligo 3; Figure 7A; data not shown), presumably 
because it is rapidly degraded by RNA surveillance 
machineries. However, quantitative RT-PCR (qRT-PCR) 
results showed an approximately 25-fold increase 
in XUT levels in the absence of Nab3 (Figure 7F), 
demonstrating a role for Nab3 in suppressing the 
expression of this XUT. The Pol II PAR-CLIP data revealed 
transcription downstream of the IPP1 polyadenylation signals 
(Figure 7A), indicating that a fraction of polymerases 
did not terminate at these sites. Depletion of Nab3 resulted 

Figure 8 Nrd1 and Nab3 can terminate transcription of long transcripts by binding to 3' UTRs. (A, B) Nrd1 and Nab3 preferentially bind 
transcripts shorter than approximately 1 kb. The histogram in (A) shows the length distribution (including UTRs) of transcripts bound by Nrd1 and Nab3 in 
the 3' UTR. Only transcripts where cross-linked motifs mapped to the 3' UTR were selected. The bracket indicates the percentage of transcripts 
longer than 782 nucleotides. The boxplot in (B) shows a comparison of the length distribution of the transcripts in (A) with the length distribution of 
all non-intronic protein coding genes in yeast. The P-value was calculated using a two-sample Kolmogorov-Smirnov test and indicates the 
likelihood that the two samples originate from the same continuous distribution. (C, D) UCSC genome browser images of YTA7 region. 'Selected 
intervals' indicate genomic regions with a read coverage FDR <0.01 used for pyMotif analyses. The Pol II serine phosphorylation ChIP data 
were obtained from [37]. See the legend to Figure 5 for more details. Chromosomal positions of RT-PCR products are indicated below the 
YTA7 gene. The Nab3 and Nrd1 motifs in the approximately 100 bp region downstream of YTA7 are indicated in cyan and red, respectively. 
(E) Quantitative RT-PCR results for YTA7 coding sequence (exon) and downstream region. Error bars indicate standard deviations. 



in an approximately six-fold increase in transcription 
downstream of the annotated IPP1 polyadenylation sites 
(Figure 7G) and low levels of IPP1 read-through tran- 
scripts could be detected by northern blotting and end- 
point RT-PCR (Figure 7D,H). We conclude that here 
Nab3 functions as a 'fail-safe' terminator by preventing 
the polymerase from transcribing beyond the IPP1 poly- 
adenylation sites into the HHT1 gene. Consistent with 
the low level of Nrd1 cross-linking in this region, Nrd1 
depletion only modestly increased the XUT levels and 
no significant increase in read-through transcription of 
IPP1 could be detected (Figure 7A,D,G). These data 
indicate a role for Nab3 in fail-safe termination of IPP1 
and in suppressing XUT expression, which may interfere 
with transcription of genes on the opposite strand. 

Nrd1-Nab3-dependent transcription termination of long 
mRNA transcripts 

The level of serine 5 phosphorylated CTD gradually decreases 
during transcription of coding sequences, and it 
has been shown that Nrd1-dependent transcription termination 
becomes less efficient once approximately 900 
nucleotides have been transcribed [27,28]. Almost half 
of the transcripts bound by both Nrd1 and Nab3 in the 
3' UTR were longer than approximately 800 nucleotides 
(Figure 8A). However, compared to the length distribution 
of all the analyzed protein coding genes, both proteins did 
preferentially cross-link to transcripts shorter than 1 kb 
(Figure 8B). To determine whether Nrd1-Nab3 can terminate 
transcripts longer than 1 kb, we monitored transcription 
of the approximately 4.7 kb YTA7 gene in Nrd1-Nab3 
depleted cells. The YTA7 transcript was selected because 
significant cross-linking of Nrd1 and Nab3 was detected 
mainly in the 3' UTR. Notably, in contrast to the IPP1 
transcript, Nrd1-Nab3 cross-linked primarily upstream 
of polyadenylation sites, indicating that Nrd1-Nab3 
termination could precede CPF-dependent termination 
(Figure 8C,D). The strength of Nrd1-Nab3-dependent 
transcription termination depends on at least three factors: 
(1) the number of clustered Nrd1-Nab3 motifs in a 
sequence, (2) the organization of the binding sites and 
(3) the presence of AU-rich sequences surrounding the 
binding sites [16,35]. Three Nab3 motifs were located 
within 70 nucleotides of the cross-linked Nrd1 motif in 
the 3' UTR of YTA7, which were surrounded by AU-rich 
polyadenylation sequences (Figure 8D). This indicates 
that this region has the signals required for Nrd1-Nab3-directed 
transcription termination. To test this, we 
performed qRT-PCR with oligonucleotides that amplify 
sequences downstream of the YTA7 3' UTR. We also 
measured YTA7 mRNA levels using oligonucleotides 
that amplify a fragment of the YTA7 exon (Figure 8E). 
The results show that depletion of Nrd1 and/or Nab3 
led to an increase in transcription downstream of the 
YTA7 3' UTR (Figure 8E), indicating read-through. However, 
we cannot exclude the possibility that these transcripts 
represent different isoforms of the same gene [48]. 
As with IPP1, depletion of Nab3 had by far the strongest 
effect (Figure 8E). Strikingly, we could also detect a two- to 
four-fold increase in YTA7 mRNA levels in the absence 
of these proteins. This suggests that, by default, a significant 
fraction of YTA7 mRNA is degraded via the Nrd1-Nab3 
termination pathway. 

Genome-wide ChIP data had indicated that Nrd1 binding 
correlated with serine 7 phosphorylation of the Pol II 
CTD, whereas recruitment of factors required for the conventional 
CPF pathway correlated with serine 2 phosphorylation 
[37]. Both serine 7 and serine 2 phosphorylation 
peaked in the 3' UTR of YTA7 (Figure 8C) [37], indicating 
that both the Nrd1-Nab3 and CPF termination pathways 
are active in this region. This organization of termination 
signals is frequently found in cryptic unstable transcripts (CUTs) 
[35], many of which are downregulated via the Nrd1-Nab3 
pathway. It appears that a similar mechanism is 
used to regulate YTA7 mRNA levels, and our bioinformatics 
analyses suggest that several hundred genes could 
be regulated in this way; we are currently investigating 
this in more detail. Transcriptome-wide, the Nrd1-Nab3 
UV cross-linking profiles change when cells are starved 
of glucose [6]. It is conceivable, therefore, that the 
expression levels of these genes are dictated by 
nutrient availability. 

Conclusions 

We have presented a comprehensive analysis of Nrd1 and 
Nab3 PAR-CLIP datasets using the pyCRAC tool suite. We 
have uncovered more than a thousand potential Nrd1-Nab3 
mRNA targets, and our data indicate that Nrd1-Nab3 
play an important role in the nutrient response and mitochondrial 
function. We have also provided valuable biological 
insights into regulation of mRNA transcription 
by the Nrd1-Nab3 termination pathway. Our data support 
a role for Nab3 in 'fail-safe' termination and regulation 
of XUT expression. Moreover, we demonstrate that 
Nrd1-Nab3 can terminate transcription of long transcripts 
and downregulate mRNA levels by binding to 3' UTRs. 
We speculate that at least several hundred genes are 
regulated in this way. We are confident that the analyses 
presented here will be a useful resource for groups working 
on transcription termination. 

Materials and methods 

pyCRAC software 

The data described here were generated using pyCRAC 
version 1.1, which can be downloaded from [34]. The 
Galaxy version is available on the Galaxy tool-shed at 
[49] and requires pyCRAC to be installed in the 
/usr/local/bin/ directory. 

Sequence and feature files 

All Gene Transfer Format (GTF) annotation and genomic 
sequence files were obtained from ENSEMBL. Genomic 
coordinates for annotated CUTs, SUTs, TSSs, polyadenyla- 
tion sites and UTRs were obtained from the Saccharomyces 
Genome Database (SGD) [22,38-41]. To visualize the 
data in the UCSC genome browser, the pyGTF2bed and 
pyGTF2bedGraph tools were used to convert pyCRAC 
GTF output files to a UCSC-compatible BED format. 

Raw data processing and reference sequence alignment 

Nrd1, Nab3 and Pol II (Rpb2) PAR-CLIP datasets 
were downloaded from the Gene Expression Omnibus 
(GEO) database (GSM791764, Nrd1; GSM791765, Rpb2; 
GSM791767, Nab3). The fastx_toolkit [50] was used to 
remove low quality reads, read artifacts and adapter 
sequences from fastq files. Duplicate reads were removed 
using the pyCRAC pyFastqDuplicateRemover tool. Reads 
were mapped to the 2008 S. cerevisiae genome (version 
EF2.59) using novoalign version 2.07 [51] and only cDNAs 
that mapped to a single genomic location were considered. 

Counting overlap with genomic features 

PyReadCounters was used to calculate overlap between 
aligned cDNAs and yeast genomic features. To simplify 
the analyses, we excluded intron-containing mRNAs. 
UTR coordinates were obtained from the Saccharomyces 
Genome Database (SGD) [40,52]. The yeast genome ver- 
sion EF2.59 genomic feature file (2008; ENSEMBL) was 
used for all the analyses described here. 

Calculation of motif false discovery rates 

The pyCalculateFDRs script uses a modified version of an 
FDR algorithm implemented in Pyicos [9]. For a detailed 
explanation of how the algorithm works, please see the 
pyCRAC documentation. Reads overlapping a gene or 
genomic feature were randomly distributed a hundred 
times over the gene sequence and FDRs were calculated 
by dividing the probability of finding a region in the 
PAR-CLIP data with the same coverage by the probability 
of finding the same coverage in the gene in the randomized 
data. We only selected regions with an FDR <0.01. 
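
The randomization step described above can be illustrated with a minimal Python sketch. It is not the pyCalculateFDRs implementation: the function names, the fixed read length and the per-height ratio (mean number of positions reaching a given coverage in the randomized data divided by the number in the observed data, capped at 1) are assumptions made for illustration, and the pyCRAC documentation remains the authoritative description.

```python
import random


def coverage(read_starts, read_len, gene_len):
    """Per-nucleotide coverage for fixed-length reads placed on one gene."""
    cov = [0] * gene_len
    for start in read_starts:
        for pos in range(start, min(start + read_len, gene_len)):
            cov[pos] += 1
    return cov


def positions_at_or_above(cov, height):
    """Number of nucleotides whose coverage is at least `height`."""
    return sum(1 for c in cov if c >= height)


def height_fdrs(read_starts, read_len, gene_len, iterations=100, seed=0):
    """Estimate an FDR for every observed coverage height (hypothetical helper)."""
    rng = random.Random(seed)
    observed = coverage(read_starts, read_len, gene_len)
    heights = range(1, max(observed) + 1)
    random_totals = {h: 0 for h in heights}
    for _ in range(iterations):
        # Redistribute the same number of reads uniformly over the gene.
        shuffled = [rng.randint(0, gene_len - read_len) for _ in read_starts]
        cov = coverage(shuffled, read_len, gene_len)
        for h in heights:
            random_totals[h] += positions_at_or_above(cov, h)
    fdrs = {}
    for h in heights:
        expected = random_totals[h] / iterations
        # Assumed orientation: randomized over observed, so low values mark enrichment.
        fdrs[h] = min(1.0, expected / positions_at_or_above(observed, h))
    return fdrs
```

In this sketch, regions whose coverage height has an estimated FDR below 0.01 would be retained, mirroring the cut-off used here.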

Motif analyses 

The motif analyses were performed using the pyMotif 
tool from the pyCRAC suite. To indicate overrepresenta- 
tion of a k-mer sequence in the experimental data, 
pyMotif calculates Z-scores for each k-mer, defined as 
the number of standard deviations by which an actual 
k-mer count minus the k-mer count from random data 
exceeds zero. K-mers were extracted from contigs that 
mapped sense or anti-sense to yeast genomic features. 
Repetitive sequences in reads or clusters were only counted 
once to remove biases towards homopolymeric sequences. 
Bedtools was used to extract motifs that overlap with 
genomic features such as exons and UTRs and plots 
were generated using Gnuplot. The EMBOSS tool fuzznuc 
was used to extract genomic coordinates for all possible 
Nrd1 and Nab3 binding sites and the output files were 
converted to the GTF format. 
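
The Z-score calculation can be sketched in Python as follows. This is an illustration under stated assumptions rather than the pyMotif code: the random data are generated here by shuffling each sequence, the standard deviation is taken over the shuffled counts of each k-mer, and each k-mer is counted at most once per sequence as one reading of the repeat-handling rule. The function names are hypothetical, and sequences use the DNA alphabet because the reads are cDNAs.

```python
import random
from collections import Counter
from statistics import mean, pstdev


def kmer_counts(seqs, k):
    """Count k-mers, counting each k-mer at most once per sequence."""
    counts = Counter()
    for seq in seqs:
        counts.update({seq[i:i + k] for i in range(len(seq) - k + 1)})
    return counts


def kmer_zscores(seqs, k, shuffles=100, seed=0):
    """Z-score of each observed k-mer against shuffled versions of the data."""
    rng = random.Random(seed)
    observed = kmer_counts(seqs, k)
    background = []
    for _ in range(shuffles):
        # Shuffling preserves the nucleotide composition of every sequence.
        shuffled = ["".join(rng.sample(s, len(s))) for s in seqs]
        background.append(kmer_counts(shuffled, k))
    zscores = {}
    for kmer, obs in observed.items():
        null = [b[kmer] for b in background]
        sd = pstdev(null)
        if sd > 0:  # k-mers with no variation in the random data are skipped
            zscores[kmer] = (obs - mean(null)) / sd
    return zscores
```

Sorting the returned dictionary by value would rank k-mers by overrepresentation, which is how motifs such as UGUA (TGTA in cDNA) would be expected to surface.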

Generation of genome-wide coverage plots 

PyBinCollector was used to generate the coverage plots. 
To normalize the gene lengths, the tool divided the gene 
sequences over an equal number of bins. For each read, 
cluster (and their mutations), it calculated the number 
of nucleotides that map to each bin (referred to as nucleo- 
tide densities). To plot the distribution of T-C mutations 
over the 4 nucleotide Nrd1-Nab3 RNA binding motifs, we 
added 50 nucleotides up- and downstream of genomic 
coordinates for each identified motif, and divided these 
into 104 bins, yielding one nucleotide per bin and the 
motif start at bin 51. We then calculated the number of 
T-C substitutions that map to each bin and divided the 
number by the total number of Ts in each bin, yielding 
T-C substitution percentages. To plot the distribution 
of cross-linked motifs around TSSs, we included 500 
nucleotides up- and downstream of the start sites and 
divided these into 1,001 bins, yielding one nucleotide 
per bin. To generate the heat maps shown in Figures 3 
and 4, we used the -outputall flag in pyBinCollector. 
The resulting data were K-means clustered using Cluster 
3.0 [53]. Heat maps were generated using TreeView [54]. 
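
The per-bin T-C substitution percentages described above for the 4 nucleotide motifs can be sketched as follows. This is not the pyBinCollector implementation: the function name and inputs are hypothetical, only the plus strand is considered, and a T is scored as substituted if at least one read carries a T-C change at that position. List index 50 in the output corresponds to bin 51, the motif start.

```python
FLANK = 50  # nucleotides added on each side of a motif


def tc_substitution_percentages(genome, motif_starts, motif_len, tc_sites):
    """Percentage of Ts carrying a T-C substitution in each one-nucleotide bin.

    genome: plus-strand sequence (str); motif_starts: 0-based motif starts;
    tc_sites: set of 0-based positions with at least one T-C substitution.
    Returns one value per bin (None where no T falls in the bin).
    """
    n_bins = motif_len + 2 * FLANK  # a 4-nt motif gives 104 bins
    t_counts = [0] * n_bins
    tc_counts = [0] * n_bins
    for start in motif_starts:
        window_start = start - FLANK
        if window_start < 0 or window_start + n_bins > len(genome):
            continue  # skip motifs too close to a sequence end
        for b in range(n_bins):
            pos = window_start + b
            if genome[pos] == "T":
                t_counts[b] += 1
                if pos in tc_sites:
                    tc_counts[b] += 1
    return [100.0 * tc / t if t else None for tc, t in zip(tc_counts, t_counts)]
```

The same windowing would apply to the TSS plots by using a 500-nucleotide flank around a single position, which gives 1,001 bins.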

Western and northern blot analyses 

Western blot analyses and genetic depletion of Nrd1-Nab3 
using GAL::3HA strains were performed as previously described 
[24]. Briefly, cells were grown in YPGalRaf (2% galactose, 
2% raffinose) to an OD600 of approximately 0.5 and 
shifted to YPD medium (2% glucose) for 9 
(GAL::3HA-nrd1/GAL::3HA-nab3), 10 (GAL::3HA-nrd1) or 12 hours 
(GAL::3HA-nab3). Total RNA extraction was performed 
as previously described [55]. Northern blotting analyses 
were performed using ULTRAhyb-Oligo according to the 
manufacturer's procedures (Ambion, Austin, TX, USA). 
Oligonucleotides used in this study are listed in Table S3 
in Additional file 1. Nrd1 and Nab3 proteins were detected 
using horseradish peroxidase-conjugated anti-HA antibodies 
(Santa Cruz, Dallas, TX, USA; 1:5,000). 

RT-PCR analyses 

The oligonucleotide primers used for the RT-PCR analyses 
are listed in Table S3 in Additional file 1. Total RNA 
was treated with DNase I (Ambion) according to the manufacturer's 
instructions. For the qRT-PCR analyses, RNA was 
reverse-transcribed and amplified using qScript One-Step 
SYBR Green qRT-PCR (Quanta Bioscience, Gaithersburg, 
MD, USA), performed on a Roche LightCycler 480 according 
to the manufacturer's instructions (Roche, Burgess Hill, 
UK). Each reaction contained 50 ng template RNA and 
250 nM gene-specific primers. Thermal cycling conditions 
were 50°C for 5 minutes, 95°C for 2 minutes, 
followed by 40 cycles of 95°C for 3 s and 60°C for 30 s. Appropriate 
no-RT and no-template controls were included in 
each assay, and a dissociation analysis was performed to 
test assay specificity. Relative quantification of gene expression 
was calculated using the Roche LightCycler 480 
Software. YTA7 levels were normalized to the levels of 
the PPM2 transcript (NM_001 18395), for which no significant 
cross-linking of Nrd1 and Nab3 was detected. For the 
end-point RT-PCR reactions, 100 ng of total RNA was reverse 
transcribed using SuperScript III at 50°C according 
to the manufacturer's instructions (Invitrogen, Paisley, UK) 
with 2 µM of IPP1 reverse primer. The PCR included 200 
nM of forward primers. Thermal cycling conditions were 
35 cycles of 95°C for 30 s, 60°C for 30 s and 72°C for 
1 minute. 

Additional files 



Additional file 1: Table S1. PyMotif identified previously described 
Nrd1-Nab3 RNA binding motifs. Shown are overrepresented 4- to 6-mers 
(k-mers) and corresponding Z-scores that were isolated from the Nrd1 (11,964) 
and Nab3 (18,222) intervals with an FDR value of <0.01. Previously identified 
Nrd1 and Nab3 motifs are highlighted in red and blue, respectively. Table S2. 
Transcripts that contained cross-linked Nrd1-Nab3 motifs in 3' UTRs. Table S3. 
Oligonucleotides used in this study. 

Additional file 2: Figure S1. Mutations are enriched in and around 
Nrd1 and Nab3 RNA binding motifs. (A, B) Analysis of T-C mutations 
found in the PAR-CLIP data near Nab3 and Nrd1 RNA binding motifs. 
(C) Analysis of deletions found in the Nab3 CRAC data around the 
Nab3 motifs [24]. pyBinCollector was used to calculate the coverage of T-C 
mutations or deletions in read contigs (see main text) within a 50-nucleotide 
window over Nab3 (CUUG, UCUU) (B, C) and Nrd1 (UGUA, GUAG) (A) motifs 
identified in the genome. To calculate T-C conversion percentages, the 
number of T-C substitutions was divided by the total number of Ts at 
each position. The asterisks indicate the positions in the motif where 
T-C substitutions were most frequently found in the Nrd1-Nab3 motifs. 